Alignment of Tandem Repeats with Excision, Duplication, Substitution and Indels (EDSI)
Abstract
Traditional sequence comparison by alignment applies a mutation model comprising two events: substitutions and indels (insertions or deletions) of single positions (SI). However, modern genetic analysis recognizes a variety of more complex mutation events (e.g., duplications, excisions and rearrangements), especially in DNA. With ever more DNA sequence data becoming available, the need to accurately compare sequences that have clearly undergone more complicated mutational processes is becoming critical. Herein we introduce a new model in which four mutational events are considered: excision and duplication of tandem repeats, as well as substitutions and indels of single positions (EDSI). Assuming the EDSI model, we develop a new algorithm for pairwise alignment and comparison of DNA sequences containing tandem repeats. To evaluate our method, we apply it to the spa VNTR (variable number of tandem repeats) of Staphylococcus aureus, a bacterium of great medical importance.
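To make the contrast between the SI model and a duplication-aware model concrete, the following is a minimal sketch and not the EDSI algorithm itself, which the abstract does not specify. It computes a global alignment cost over sequences of repeat units by dynamic programming. The function name `align_cost`, the parameters `sub`, `indel` and `dup`, and all cost values are hypothetical choices made for illustration. With `dup` left unset it reduces to the plain SI edit distance; with `dup` set, an unaligned unit that equals its left neighbour is charged the cheaper `dup` cost, a crude stand-in for a one-unit tandem duplication or excision.

```python
def align_cost(x, y, sub=1, indel=1, dup=None):
    """Global alignment cost between two sequences of repeat units.

    dup=None gives the plain SI model (substitutions and single-position
    indels). If dup is given, a gapped unit that equals its left neighbour
    in the same sequence costs `dup` instead of `indel`. This is an
    illustrative assumption, not the published EDSI scoring.
    """
    if dup is None:
        dup = indel

    def gap(seq, k):
        # Cost of leaving seq[k] unaligned.
        if k > 0 and seq[k] == seq[k - 1]:
            return dup
        return indel

    n, m = len(x), len(y)
    # d[i][j] = cheapest cost of aligning x[:i] with y[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + gap(x, i - 1)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + gap(y, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if x[i - 1] == y[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j - 1] + match,      # match or substitution
                d[i - 1][j] + gap(x, i - 1),  # unit of x left unaligned
                d[i][j - 1] + gap(y, j - 1),  # unit of y left unaligned
            )
    return d[n][m]


# Two hypothetical repeat profiles that differ by two extra copies of unit "B".
long_profile = ["A", "B", "B", "B", "C"]
short_profile = ["A", "B", "C"]
si_cost = align_cost(long_profile, short_profile)
dup_cost = align_cost(long_profile, short_profile, dup=0.5)
```

Under the SI model the two extra copies count as two independent indels, whereas the duplication-aware variant scores them as cheaper repeat events, which is the kind of difference the EDSI model is meant to capture.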
Similar resources
Graph-based modeling of tandem repeats improves global multiple sequence alignment
Tandem repeats (TRs) are often present in proteins with crucial functions, responsible for resistance, pathogenicity and associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, requiring accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but curre...
Estimation of the Duplication History under a Stochastic Model for Tandem Repeats
We present a stochastic model for tandem duplication and substitution mutations that can be used to estimate relative mutation rates and the total number of mutations from a single sequence. Important parameters of the model include the probability of a substitution mutation and the probabilities of tandem duplications of various lengths. Our model indicates that if the probability of substitut...
Indel seeds for homology search
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguo...
Sequence turnover and tandem repeats in cis-regulatory modules in Drosophila
The path by which regulatory sequence can change, yet preserve function, is an important open question for both evolution and bioinformatics. The recent sequencing of two additional species of Drosophila plus the wealth of data on gene regulation in the fruit fly provides new means for addressing this question. For regulatory sequences, indels account for more base pairs (bp) of change than sub...
A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment
In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at the time, since they are not compliant with the underlying assumption of statistical independence of adjacent residues. As a consequence, the presence of tandem repeats in sequences u...